multi-head attention
multi-head
attention
https://gyazo.com/7e9cf764869843934cc93708186da2cf
Multiple self-attention (scaled dot-product attention) heads arranged in parallel
The outputs of the self-attention heads are concatenated and passed through a fully connected layer
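A minimal NumPy sketch of this idea (shapes, weight matrices, and names like `num_heads` and `d_model` are illustrative assumptions, not tied to any specific library):

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(q, k, v):
    # attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = q.shape[-1]
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_k)
    return softmax(scores) @ v

def multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads):
    # x: (seq_len, d_model); each weight matrix: (d_model, d_model)
    seq_len, d_model = x.shape
    d_head = d_model // num_heads

    # Project, then split into heads: (num_heads, seq_len, d_head)
    def split(t):
        return t.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ w_q), split(x @ w_k), split(x @ w_v)

    # Run scaled dot-product attention for all heads in parallel
    heads = scaled_dot_product_attention(q, k, v)

    # Concatenate the head outputs and apply the final fully connected layer
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ w_o

# Illustrative usage with random inputs and weights
seq_len, d_model, num_heads = 5, 16, 4
rng = np.random.default_rng(0)
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v, w_o = (rng.normal(size=(d_model, d_model)) for _ in range(4))
out = multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads)
print(out.shape)  # (5, 16)
```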
#Transformer
#BERT